Background: Dust entering the Sierras comes from both regional and global sources – and the contribution of each source changes over time. There is evidence that microbial communities can differ based on dust source and timing of dust deflation, and that this can have subsequent effects on endophytic associations (Caroline Frank’s work) and soil microbial communities at the deposition location. We therefore wanted to know:
Limitation: our time comparisons may be skewed because we did not sterilize collectors in between sampling events. Time points may appear more similar to each other than in reality. It’s also important to run on presence-absence data only; we don’t want to capture community differences that are driven by post-depositional divergence in relative abundances.
# wd is set to /RdataDust/DustMS_Rmarkdown which contains this file and all other linked files (or should)
getwd()
## [1] "/Users/colleennell/Dropbox/rstats/consulting/Mia/RdataDust/DustMS_Rmarkdown"
# i made a folder in this directory named data where I left it
list.files('data')
## [1] "DustFungi.guilds.txt" "DustFungi.otu_table.taxonomy.txt"
## [3] "DustFungi.otu_table.txt" "filtered_table_w_metadata.csv"
## [5] "filtered_table_w_metadata.txt" "filtered_table_w_metadata.xlsx"
## [7] "SierraMap6.csv" "SierraMap6.xlsx"
## [9] "unweighted_unifrac_dm.txt" "weighted_unifrac_dm.txt"
map6<-read.csv("data/SierraMap6.csv", header = TRUE, col.names = c("SampleID","BarcodeSequence","LinkerPrimerSequence","Year","Month","SiteCode","RepNum","SiteRep","Site","Elevation","DateCode","DescName","Description"))
# print df
map6
# I don't have the same files referenced in DustAnnotated1.1
B16S<-read.csv("data/filtered_table_w_metadata.csv", header = TRUE, stringsAsFactors = FALSE)
B16S
# split taxa groupings into new columns
source('dust_functions.R')
splitdf<-split_taxa(B16S)
head(splitdf)
list.files('data')
## [1] "DustFungi.guilds.txt" "DustFungi.otu_table.taxonomy.txt"
## [3] "DustFungi.otu_table.txt" "filtered_table_w_metadata.csv"
## [5] "filtered_table_w_metadata.txt" "filtered_table_w_metadata.xlsx"
## [7] "SierraMap6.csv" "SierraMap6.xlsx"
## [9] "unweighted_unifrac_dm.txt" "weighted_unifrac_dm.txt"
unifrac<-read.table('data/unweighted_unifrac_dm.txt')
unifrac_wt<-read.table('data/weighted_unifrac_dm.txt')
#rows and col reflect pairwise distances?
str(unifrac)
## 'data.frame': 31 obs. of 31 variables:
## $ OS115: num 0 0.802 0.811 0.733 0.842 ...
## $ AP114: num 0.802 0 0.7 0.787 0.687 ...
## $ JJ415: num 0.811 0.7 0 0.748 0.566 ...
## $ OT115: num 0.733 0.787 0.748 0 0.788 ...
## $ JJ515: num 0.842 0.687 0.566 0.788 0 ...
## $ OS615: num 0.746 0.859 0.844 0.777 0.878 ...
## $ OJ615: num 0.779 0.711 0.627 0.714 0.659 ...
## $ JT415: num 0.769 0.669 0.618 0.73 0.646 ...
## $ JP315: num 0.698 0.768 0.784 0.734 0.827 ...
## $ OJ514: num 0.789 0.646 0.598 0.735 0.619 ...
## $ OJ515: num 0.793 0.757 0.65 0.693 0.691 ...
## $ OJ415: num 0.764 0.753 0.657 0.69 0.707 ...
## $ AS114: num 0.738 0.731 0.743 0.748 0.773 ...
## $ JS415: num 0.753 0.876 0.858 0.811 0.901 ...
## $ JP414: num 0.746 0.681 0.733 0.759 0.768 ...
## $ OT515: num 0.741 0.803 0.782 0.621 0.817 ...
## $ JJ114: num 0.83 0.671 0.597 0.789 0.565 ...
## $ OT415: num 0.74 0.808 0.774 0.613 0.819 ...
## $ OT414: num 0.736 0.683 0.69 0.722 0.725 ...
## $ JP515: num 0.699 0.766 0.79 0.732 0.836 ...
## $ JT115: num 0.77 0.652 0.624 0.743 0.635 ...
## $ AJ414: num 0.705 0.813 0.8 0.745 0.841 ...
## $ OP315: num 0.697 0.811 0.806 0.74 0.853 ...
## $ OS214: num 0.74 0.744 0.765 0.763 0.791 ...
## $ JT314: num 0.772 0.668 0.685 0.755 0.684 ...
## $ OP515: num 0.702 0.798 0.797 0.733 0.839 ...
## $ AT414: num 0.791 0.689 0.683 0.738 0.706 ...
## $ OS415: num 0.582 0.786 0.801 0.734 0.831 ...
## $ OP615: num 0.682 0.79 0.812 0.74 0.84 ...
## $ OP614: num 0.743 0.81 0.81 0.767 0.85 ...
## $ JS115: num 0.667 0.821 0.818 0.774 0.864 ...
## plot otu unifrac dissimilarities as heatmap
# first melt df so there is a new row for every pairwise combo
unifrac.melt<-unifrac%>%melt(variable.name='otu_1')%>%mutate(otu_2 = rep.int(colnames(unifrac), times=length(colnames(unifrac))))
# dark color indicates samples that were more similar to one another based on unifrac dissimilarity, blue = contains no similarities
##reorder axis so grouped by more similar samples
ggplot(data = unifrac.melt, aes(x = reorder(otu_1, value), y = reorder(otu_2, value)))+
geom_tile(aes(fill = value))+
scale_fill_gradient2(low = 'midnightblue', mid='deepskyblue3', high='yellow', midpoint = .5)+
theme(axis.text.x = element_text(angle=90))+
labs(x='Sample', y='Sample')
# make the same plot for weighted unifrac, compare
Note: Can’t run any analyses that rely on relative abundance data (e.g., simper), Address using absence-presence data
This procedure identifies OTUs as indicator species independently from their abundance in the total data set. ####Steps (from https://www.nature.com/articles/ismej2015238):
it teruherfgbwegrgterfghgfhgf